GenericBioMatch: A novel generic pattern match algorithm for biological sequences

Authors

  • Youlian Pan
  • Fazel Famili
Abstract

GenericBioMatch is a novel algorithm for exact matching in biological sequences. It allows the sequence motif pattern to contain one or more wildcard letters (e.g., Y, R, W in DNA sequences) and one or more gaps of any number of bases. GenericBioMatch is relatively fast compared to probabilistic algorithms and has very little computational overhead. It can perform exact matching of protein motifs as well as DNA motifs. The algorithm can serve as a quick validation tool for implementations of other algorithms, and as a supporting tool for probabilistic algorithms in order to reduce their computational overhead. It has been implemented in the BioMiner software (http://iititi.nrc-cnrc.gc.ca/biomine_e.trx), a suite of Java tools for integrated data mining in genomics, and has been tested successfully with DNA sequences from human, yeast, and Arabidopsis.
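The kind of pattern the abstract describes (wildcard letters plus gaps of any length) can be illustrated with a minimal Python sketch. This is not the GenericBioMatch algorithm itself, whose internals the abstract does not give; it simply translates a motif into a regular expression. The wildcard table is the standard IUPAC nucleotide code, while the `*` gap symbol and the function names `compile_motif` and `find_motif` are assumptions made for illustration only.

```python
import re

# Standard IUPAC nucleotide codes: each letter maps to the bases it can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Hypothetical gap symbol (not from the paper): zero or more bases of any kind.
GAP = "*"


def compile_motif(motif):
    """Translate a DNA motif with wildcards and gaps into a compiled regex."""
    parts = []
    for symbol in motif.upper():
        if symbol == GAP:
            # Lazy quantifier so each gap is as short as possible.
            parts.append("[ACGT]*?")
        elif symbol in IUPAC_DNA:
            bases = IUPAC_DNA[symbol]
            parts.append(bases if len(bases) == 1 else "[" + bases + "]")
        else:
            raise ValueError(f"unsupported motif symbol: {symbol!r}")
    return re.compile("".join(parts))


def find_motif(motif, sequence):
    """Return (start, end) spans of non-overlapping matches, 0-based, end exclusive."""
    regex = compile_motif(motif)
    return [m.span() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_motif("TAYR", "GGTACATATGA"))   # wildcards Y and R
    print(find_motif("GG*TG", "GGTACATATGA"))  # one gap of any length
```

With the sequence `GGTACATATGA`, the motif `TAYR` matches at spans `(2, 6)` and `(6, 10)`, and `GG*TG` matches the single span `(0, 10)`. A regex engine is a convenient stand-in here; a protein version would only need a different wildcard table.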


Similar resources

Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM

Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence’s fold and functionality. The biggest class of repetitive subsequences is “Transposable Elements”, which has its own sub-classes based on the contexts’ structures. Many studies have been performed to critically determine the structure and function of repetitiv...


A Novel Fuzzy Based Method for Heart Rate Variability Prediction

In this paper, a novel technique based on a fuzzy method is presented for chaotic nonlinear time series prediction. A fuzzy approach with the gradient learning algorithm and methods constitutes the main components of this method. The learning process in this method is similar to the conventional gradient descent learning process, except that the input patterns and parameters are stored in mem...


Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM

Finding repetitive subsequences in a genome is a challenging problem in bioinformatics research. Many approaches have been proposed to solve the problem, which can be divided into library-based and de novo methods. The library-based methods use predetermined repetitive genome subsequences, whereas library-less methods attempt to discover repetitive subsequences by analytical approach...


Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...


Identification of functional short non-coding RNAs in sheep and goat using bioinformatics methods

MicroRNAs (miRNAs) are small non-coding RNAs that have functional roles in post-transcriptional modification. They regulate gene expression by an RNA interfering pathway through cleavage or inhibition of the translation of target mRNA. Numerous miRNAs have been described for their important functions in developmental processes in numerous animals, but there is limited information about sheep an...




Publication date: 2003